Interesting-Phrase Mining for Ad-Hoc Text Analytics
Abstract
Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. These include named entities, important quotations, market slogans, and other multi-word phrases that are prominent in a dynamically derived ad-hoc subset of the corpus, e.g., being frequent in the subset but relatively infrequent in the overall corpus. We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases in ad-hoc subsets of the corpus. Our framework is evaluated using a large-scale real-world corpus of New York Times news articles.
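The notion of a phrase being "frequent in the subset but relatively infrequent in the overall corpus" can be illustrated with a small brute-force sketch. The abstract does not give the exact interestingness measure or the index structures, so everything here is an assumption made for illustration: the function names `phrases` and `top_k_interesting`, the use of word n-grams of length 2 to 3 as phrases, the `min_support` threshold, and the score itself, which is taken to be the ratio of a phrase's relative document frequency in the subset to its relative document frequency in the whole corpus.

```python
from collections import Counter
import heapq


def phrases(tokens, min_len=2, max_len=3):
    """Return the set of distinct word n-grams found in one document."""
    found = set()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            found.add(tuple(tokens[i:i + n]))
    return found


def top_k_interesting(corpus, subset_ids, k=3, min_support=2):
    """Rank phrases by (relative frequency in subset) / (relative frequency in corpus).

    corpus      -- list of document strings
    subset_ids  -- indices of the documents forming the ad-hoc subset
    min_support -- hypothetical cutoff: minimum number of subset documents
                   that must contain a phrase for it to be considered
    """
    subset_ids = set(subset_ids)
    corpus_df = Counter()  # document frequency over the whole corpus
    subset_df = Counter()  # document frequency over the ad-hoc subset
    for doc_id, text in enumerate(corpus):
        doc_phrases = phrases(text.lower().split())
        corpus_df.update(doc_phrases)
        if doc_id in subset_ids:
            subset_df.update(doc_phrases)

    scored = []
    for phrase, df_sub in subset_df.items():
        if df_sub < min_support:
            continue
        rel_sub = df_sub / len(subset_ids)
        rel_all = corpus_df[phrase] / len(corpus)
        scored.append((rel_sub / rel_all, " ".join(phrase)))
    # Highest score first; ties are broken by the phrase string.
    return heapq.nlargest(k, scored)
```

This sketch rescans every document for each query, which is exactly the cost that precomputed phrase indexes are meant to avoid; it only shows what is being ranked, not how to rank it efficiently.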
Similar resources
Scalable Phrase Mining for Ad-hoc Text Analytics
Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize ...
Pipelines for Ad-hoc Large-scale Text Mining
Today’s web search and big data analytics applications aim to address information needs (typically given in the form of search queries) ad-hoc on large numbers of texts. In order to directly return relevant information instead of only returning potentially relevant texts, these applications have begun to employ text mining. The term text mining cover...
Mining Interesting Aspects of a Product using Aspect-based Opinion Mining from Product Reviews (RESEARCH NOTE)
As the internet and its applications grow, E-commerce has become one of its fastest-growing applications. E-commerce customers have been given the opportunity to express their opinions about products on the web as text, in the form of reviews. In previous studies, merely finding the sentiment of reviews was not enough to capture the exact opinion of a review. In this paper, we have used A...
Distributed Analytics over Web Archives
Evolving content of the Web is being accumulated over time into Web archival collections. This creates the need for time travel search to explore the dynamics of the content. Text analytics also has a key role in exploring interesting information in text collections. Moreover, frequent phrase mining, a special case of text analytics, is an important analytical task that is motivated by the need...
BiTeM at CLEF eHealth Evaluation Lab 2016 Task 2: Multilingual Information Extraction
BiTeM/SIB Text Mining (http://bitem.hesge.ch/) is a university research group carrying out activities in semantic and text analytics applied to health and life sciences. This paper reports on the participation of our team at the CLEF eHealth 2016 evaluation lab. The processing applied to each evaluation corpus (QUAREO and CépiDC) was originally very similar. Our method is based on an Automatic...
Journal title:
- PVLDB
Volume 3, Issue
Pages -
Publication date: 2010